Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT
نویسندگان
چکیده
The continuously growing MT market faces the challenge of translating new languages, diverse genres, and different domains using a variety of available linguistic resources. As such, MT system adaptability has become a sought-after necessity. An adaptable statistical or Hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages (translation divergences). We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.
منابع مشابه
Title of dissertation : COMBINING LINGUISTIC AND MACHINE LEARNING TECHNIQUES FOR WORD ALIGNMENT IMPROVEMENT
Title of dissertation: COMBINING LINGUISTIC AND MACHINE LEARNING TECHNIQUES FOR WORD ALIGNMENT IMPROVEMENT Necip Fazıl Ayan, Doctor of Philosophy, 2005 Dissertation directed by: Professor Bonnie J. Dorr Department of Computer Science Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success ...
متن کاملBilingual Lexicon Construction Using Large Corpora
This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building bilingual lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of ltered linguistic feedback between term ...
متن کاملLinguistic Heuristics in Word Alignment
The IBM statistical machine translation (SMT) models [Brown et al.1993] have been extremely influential in computational linguistics in the past decade. The (arguably) most striking characteristic of the IBM-style SMT models is their total lack of inherent linguistic knowledge. The IBM models demonstrated how much one can do with pure statistical techniques. This has inspired a whole new genera...
متن کاملCombining Unsupervised and Supervised Alignments for MT: An Empirical Study
Word alignment plays a central role in statistical MT (SMT) since almost all SMT systems extract translation rules from word aligned parallel training data. While most SMT systems use unsupervised algorithms (e.g. GIZA++) for training word alignment, supervised methods, which exploit a small amount of human-aligned data, have become increasingly popular recently. This work empirically studies t...
متن کاملA Maximum Entropy Approach to Combining Word Alignments
This paper presents a new approach to combining outputs of existing word alignment systems. Each alignment link is represented with a set of feature functions extracted from linguistic features and input alignments. These features are used as the basis of alignment decisions made by a maximum entropy approach. The learning method has been evaluated on three language pairs, yielding significant ...
متن کامل